In this article, we shall describe some of the most interesting topics in the subject of Complexity Science for a general audience. Anyone with a solid foundation in high school mathematics (with some calculus) and an elementary understanding of computer programming will be able to follow this article. First, we shall explain the significance of the P versus NP problem and solve it. Next, we shall describe two other famous mathematics problems, the Collatz 3n + 1 Conjecture and the Riemann Hypothesis, and show how both Chaitin's incompleteness theorem and Wolfram's notion of "computational irreducibility" are important for understanding why no one has, as of yet, solved these two problems.

Disclaimer: This article was authored by Craig Alan Feinstein in his private capacity. No official support or endorsement by the U.S. Government is intended or should be inferred.

1 Challenge

Imagine that you have a collection of one billion lottery tickets scattered throughout your basement in no particular order. An official from the lottery announces the number of the winning lottery ticket. For a possible prize of one billion dollars, is it a good idea to search your basement until you find the winning ticket or until you come to the conclusion that you do not possess the winning ticket? Most people would think not - even if the winning lottery ticket were in your basement, performing such a search could take $10^9/(60 \times 60 \times 24 \times 365.25)$ years, over thirty work-years, assuming that it takes you at least one second to examine each lottery ticket. Now imagine that you have a collection of only one thousand lottery tickets in your basement. Is it a good idea to search your basement until you find the winning ticket or until you come to the conclusion that you do not possess the winning ticket? Most people would think so, since doing so would take at most a few hours.
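The arithmetic behind these two estimates can be checked with a short sketch. The function names and the default of exactly one second per ticket are assumptions made for illustration; the text only says "at least one second".

```python
# Seconds in an average year, matching the 365.25-day figure used in the text.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25


def search_time_years(num_tickets, seconds_per_ticket=1.0):
    """Worst-case time, in years, to examine every ticket exactly once."""
    return num_tickets * seconds_per_ticket / SECONDS_PER_YEAR


def search_time_hours(num_tickets, seconds_per_ticket=1.0):
    """Worst-case time, in hours, to examine every ticket exactly once."""
    return num_tickets * seconds_per_ticket / 3600.0


if __name__ == "__main__":
    print(f"one billion tickets: {search_time_years(10**9):.1f} years")
    print(f"one thousand tickets: {search_time_hours(1000):.2f} hours")
```

At exactly one second per ticket the thousand-ticket search takes well under an hour, so "a few hours" leaves room for slower examination.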

From these scenarios, let us postulate a general rule that the maximum time that it may take for one person to search N unsorted objects for one specific object is directly proportional to N. This is clearly the case for physical objects, but what about abstract objects? For instance, let us suppose that a dating service is trying to help n single women and n single men to get married. Each woman gives the dating service a list of characteristics that she would like to see in her potential husband, for instance, handsome, caring, athletic, domesticated, etc. And each man gives the dating service a list of characteristics that he would like to see in his potential wife, for instance, beautiful, obedient, good cook, thrifty, etc. The dating service is faced with the task of arranging dates for each of its clients so as to satisfy everyone's preferences.

Now there are $n!$ (which is shorthand for $n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$) possible ways for the dating service to arrange dates for each of its clients, but only a fraction of such arrangements would satisfy all of its clients. If n = 100, it would take too long for the dating service's computer to evaluate all 100! possible arrangements until it finds an arrangement that would satisfy all of its clients. (100! is too large a number of possibilities for any modern computer to handle.) Is there an efficient way for the dating service's computer to find dates with compatible potential spouses for each of the dating service's clients so that everyone is happy, assuming that it is possible to do so? Yes, and here is how:

Matchmaker Algorithm - Initialize the set $M = \emptyset$. Search for a list of compatible relationships between men and women that alternates between a compatible relationship $\{x_1, x_2\}$ not contained in set M, followed by a compatible relationship $\{x_2, x_3\}$ contained in set M, followed by a compatible relationship $\{x_3, x_4\}$ not contained in set M, followed by a compatible relationship $\{x_4, x_5\}$ contained in set M, and so on, ending with a compatible relationship $\{x_{m-1}, x_m\}$ not contained in set M, where both $x_1$ and $x_m$ are not members of any compatible relationships contained in set M. Once such a list is found, for each compatible relationship $\{x_i, x_{i+1}\}$ in the list, add $\{x_i, x_{i+1}\}$ to M if $\{x_i, x_{i+1}\}$ is not contained in M or remove $\{x_i, x_{i+1}\}$ from M if $\{x_i, x_{i+1}\}$ is contained in M. (Note that this procedure must increase the size of set M by one.) Repeat this procedure until no such list exists.
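The alternating-list search described above is the standard augmenting-path technique for bipartite matching. The following is a minimal sketch of one common way to implement it (a recursive depth-first search); the function names and the dictionary-based input format are assumptions for illustration, not part of the original description.

```python
def find_alternating_list(man, compatible, match_of_woman, visited):
    """Search for an alternating list that starts at `man`.

    Returns True and flips the relationships along the list (growing M
    by one) if such a list exists; otherwise leaves M unchanged.
    """
    for woman in compatible[man]:
        if woman in visited:
            continue
        visited.add(woman)
        # Either the woman is unmatched (the list ends here), or her
        # current partner can be re-matched further along the list.
        if woman not in match_of_woman or find_alternating_list(
            match_of_woman[woman], compatible, match_of_woman, visited
        ):
            match_of_woman[woman] = man
            return True
    return False


def matchmaker(compatible):
    """Return M as a dict mapping each matched woman to her partner.

    `compatible` maps each man to the list of women with whom he has a
    mutually compatible relationship (hypothetical input format).
    """
    match_of_woman = {}
    for man in compatible:
        find_alternating_list(man, compatible, match_of_woman, set())
    return match_of_woman


if __name__ == "__main__":
    clients = {"Al": ["Ann", "Bea"], "Bob": ["Ann"], "Cy": ["Bea", "Cat"]}
    print(matchmaker(clients))
```

Everyone is satisfied exactly when the returned matching has as many pairs as there are men. Each search visits every compatible pair at most once, so the total work is polynomial in n rather than proportional to $n!$.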

Such an algorithm is guaranteed to efficiently find an arrangement M that will satisfy all of the dating service's clients whenever such an arrangement exists [30]. So we see that with regard to abstract objects, it is not necessarily the case that the maximum time that it may take for one to search N unsorted objects for a specific object is directly proportional to N; in the dating service example, there are $n!$ possible arrangements between men and women, yet it is not necessary for a computer to examine all $n!$ arrangements in order to find a satisfactory arrangement. One might think that the problem of finding a satisfactory dating arrangement is easy for a modern computer to solve because the list of pairs of men and women who are compatible is relatively small (of size at most $n^2$, which is much smaller than the number of possible arrangements $n!$) and because it is easy to verify whether any particular arrangement will make everyone happy. But this reasoning is invalid, as we shall demonstrate:

2 The SUBSET-SUM Problem 

Consider the following problem: You are given a set $A = \{a_1, \ldots, a_n\}$ of n integers and another integer b which we shall call the target integer. You want to know if there exists a subset of A for which the sum of its elements is equal to b. (We shall consider the sum of the elements of the empty set to be zero.) This problem is called the SUBSET-SUM problem. Now, there are $2^n$ subsets of A, so one could naively solve this problem by exhaustively comparing the sum of the elements of each subset of A to b until one finds a subset-sum equal to b, but such a procedure would be infeasible for even the fastest computers in the world to implement when n = 100. Is there an algorithm which can considerably reduce the amount of work for solving the SUBSET-SUM problem? Yes, there is an algorithm discovered by Horowitz and Sahni in 1974 [21], which we shall call the Meet-in-the-Middle algorithm, that takes on the order of $2^{n/2}$ steps to solve the SUBSET-SUM problem instead of the $2^n$ steps of the naive exhaustive comparison algorithm:

Meet-in-the-Middle Algorithm - First, partition the set A into two subsets, $A^+ = \{a_1, \ldots, a_{\lceil n/2 \rceil}\}$ and $A^- = \{a_{\lceil n/2 \rceil + 1}, \ldots, a_n\}$. Let us define $S^+$ and $S^-$ as the sets of subset-sums of $A^+$ and $A^-$, respectively. Sort sets $S^+$ and $b - S^-$ in ascending order. Compare the first elements in both of the lists. If they match, then stop and output that there is a solution. If not, then compare the greater element with the next element in the other list. Continue this process until there is a match, in which case there is a solution, or until one of the lists runs out of elements, in which case there is no solution.
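The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' original code: the function names are assumptions, and Python's built-in comparison sort is used, so the sorting stage costs on the order of $n \cdot 2^{n/2}$ rather than the linear-time sort assumed in the running-time analysis. A naive exhaustive checker is included only to cross-check the result.

```python
from itertools import combinations


def subset_sums(items):
    """All 2**len(items) subset-sums, with 0 for the empty set."""
    sums = [0]
    for a in items:
        sums += [s + a for s in sums]
    return sums


def meet_in_the_middle(A, b):
    """Return True if some subset of A sums to b."""
    half = (len(A) + 1) // 2                              # ceil(n / 2)
    left = sorted(subset_sums(A[:half]))                  # sorted S+
    right = sorted(b - s for s in subset_sums(A[half:]))  # sorted b - S-
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            return True        # s+ = b - s-, i.e. s+ + s- = b
        if left[i] < right[j]:
            i += 1             # compare the greater element with the next
        else:                  # element of the other list
            j += 1
    return False


def naive_subset_sum(A, b):
    """Exhaustively compare b to every one of the 2**n subset-sums."""
    return any(sum(c) == b
               for r in range(len(A) + 1)
               for c in combinations(A, r))


if __name__ == "__main__":
    A = [3, 34, 4, 12, 5, 2]
    print(meet_in_the_middle(A, 9), meet_in_the_middle(A, 30))
```

Each half contributes at most $2^{\lceil n/2 \rceil}$ subset-sums, which is where the $2^{n/2}$ in the step count comes from.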

This algorithm takes on the order of $2^{n/2}$ steps, since it takes on the order of $2^{n/2}$ steps to sort sets $S^+$ and $b - S^-$ (assuming that the computer can sort in linear-time) and on the order of $2^{n/2}$ steps to compare elements from the sorted lists $S^+$ and $b - S^-$. Are there any faster algorithms for solving SUBSET-SUM? $2^{n/2}$ is still a very large number when n = 100, even though this strategy is a vast improvement over the naive strategy. It turns out that no algorithm with a better worst-case running-time has ever been found since the Horowitz and Sahni paper [40]. And the reason for this is that it is impossible for such an algorithm to exist. Here is an explanation why:

Explanation: To understand why there is no algorithm with a faster worst-case running-time than the Meet-in-the-Middle algorithm, let us travel back in time seventy-five years, long before the internet. If one were to ask someone back then what a computer is, one would have gotten the answer, "a person who computes (usually a woman)" instead of the present day definition, "a machine that computes". Let us imagine that we knew two computers back then named Mabel and Mildred (two popular names for women in the 1930's [2]). Mabel is very efficient at sorting lists of integers into ascending order; for instance she can sort a set of ten integers in 15 seconds, whereas it takes Mildred 20 seconds to perform the same task. However, Mildred is very efficient at comparing two integers a and b to determine whether a < b or a = b or a > b; she can compare ten pairs of integers in 15 seconds, whereas it takes Mabel 20 seconds to perform the same task.

Let's say we were to give both Mabel and Mildred the task of determining whether there exists a subset of some four element set, $A = \{a_1, a_2, a_3, a_4\}$, for which the sum of its elements adds up to b. Since Mildred is good at comparing but not so good at sorting, Mildred chooses to solve this problem by comparing b to all of the sixteen subset-sums of A. Since Mabel is good at sorting but not so good at comparing, Mabel decides to solve this problem by using the Meet-in-the-Middle algorithm. In fact, of all algorithms that Mabel could have chosen to solve this problem, the Meet-in-the-Middle algorithm is the most efficient for her to use on sets A with only four integers. And of all algorithms that Mildred could have chosen to solve this problem, comparing b to all of the sixteen subset-sums of A is the most efficient algorithm for her to use on sets A with only four integers.

Now we are going to use the principle of mathematical induction to prove that the best algorithm for Mabel to use for solving the SUBSET-SUM problem for large n is the Meet-in-the-Middle algorithm: We already know that this is true when n = 4. Let us assume that this is true for n, i.e., that of all possible algorithms for Mabel to use for solving the SUBSET-SUM problem on sets with n integers, the Meet-in-the-Middle algorithm has the best worst-case running-time. Then we shall prove that this is also true for n + 1:

Let S be the set of all subset-sums of the set $A = \{a_1, a_2, \ldots, a_n\}$. Notice that the SUBSET-SUM problem on the set $A \cup \{a'\}$ of n + 1 integers and target b is equivalent to the problem of determining whether (1) $b \in S$ or (2) $b' \in S$ (where $b' = b - a'$). (The symbol $\in$ means "is a member of".) Also notice that these two subproblems, (1) and (2), are independent from one another in the sense that the values of b and b' are unrelated to each other and are also unrelated to set S; therefore, in order to determine whether $b \in S$ or $b' \in S$, it is necessary to solve both subproblems (assuming that the first subproblem solved has no solution). So it is clear that if Mabel could solve both subproblems in the fastest time possible and also whenever possible make use of information obtained from solving subproblem (1) to save time solving subproblem (2) and whenever possible make use of information obtained from solving subproblem (2) to save time solving subproblem (1), then Mabel would be able to solve the problem of determining whether (1) $b \in S$ or (2) $b' \in S$ in the fastest time possible [15].
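The equivalence stated at the start of this step can be checked directly on small inputs. The sketch below only illustrates that equivalence (it says nothing about running time); the function names are assumptions, and `a_prime` and `b_prime` stand for a' and b'.

```python
def subset_sum_set(items):
    """The set S of all subset-sums of `items` (0 for the empty set)."""
    sums = {0}
    for a in items:
        sums |= {s + a for s in sums}
    return sums


def solve_by_subproblems(A, a_prime, b):
    """Decide SUBSET-SUM on A plus one extra integer a_prime via (1) and (2)."""
    S = subset_sum_set(A)
    b_prime = b - a_prime
    # (1): a solution that does not use a_prime; (2): a solution that does.
    return b in S or b_prime in S


if __name__ == "__main__":
    A, a_prime = [3, 4, 12], 5
    for b in (9, 10, 24):
        direct = b in subset_sum_set(A + [a_prime])
        print(b, solve_by_subproblems(A, a_prime, b), direct)
```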

We shall now explain why the Meet-in-the-Middle algorithm has this characteristic for sets of size n + 1: It is clear that by the induction hypothesis, the Meet-in-the-Middle algorithm solves each subproblem in the fastest time possible, since it works by applying the Meet-in-the-Middle algorithm to each subproblem, without loss of generality sorting and comparing elements in lists $S^+$ and $b - S^-$ and also sorting and comparing elements in lists $S^+$ and $b' - S^-$ as the algorithm sorts and compares elements in lists $S^+$ and $b - [S^- \cup (S^- + a')]$. There are two situations in which it is possible for the Meet-in-the-Middle algorithm to make use of information obtained from solving subproblem (1) to save time solving subproblem (2) or to make use of information obtained from solving subproblem (2) to save time solving subproblem (1). And the Meet-in-the-Middle algorithm takes advantage of both of these opportunities:

• Whenever the Meet-in-the-Middle algorithm compares two elements from lists $S^+$ and $b - S^-$ and the element in list $S^+$ turns out to be less than the element in list $b - S^-$, the algorithm makes use of information obtained from solving subproblem (1) (the fact that the element in list $S^+$ is less than the element in list $b - S^-$) to save time, when n is odd, solving subproblem (2) (the algorithm does not consider the element in list $S^+$ again).

• Whenever the Meet-in-the-Middle algorithm compares two elements from lists $S^+$ and $b' - S^-$ and the element in list $S^+$ turns out to be less than the element in list $b' - S^-$, the algorithm makes use of information obtained from solving subproblem (2) (the fact that the element in list $S^+$ is less than the element in list $b' - S^-$) to save time, when n is odd, solving subproblem (1) (the algorithm does not consider the element in list $S^+$ again).

Therefore, we can conclude that the Meet-in-the-Middle algorithm whenever possible makes use of information obtained from solving subproblem (1) to save time solving subproblem (2) and whenever possible makes use of information obtained from solving subproblem (2) to save time solving subproblem (1). So we have completed our induction step to prove true for n + 1, assuming true for n.

Therefore, the best algorithm for Mabel to use for solving the SUBSET-SUM problem for large n is the Meet-in-the-Middle algorithm. But is the Meet-in-the-Middle algorithm the best algorithm for Mildred to use for solving the SUBSET-SUM problem for large n? Since the Meet-in-the-Middle algorithm is not the fastest algorithm for Mildred to use for small n, is it not possible that the Meet-in-the-Middle algorithm is also not the fastest algorithm for Mildred to use for large n? It turns out that for large n, there is no algorithm for Mildred to use for solving the SUBSET-SUM problem with a faster worst-case running-time than the Meet-in-the-Middle algorithm. Why?

Notice that the Meet-in-the-Middle algorithm takes on the order of $2^{n/2}$ steps regardless of whether Mabel or Mildred applies it. And notice that the algorithm of naively comparing the target b to all of the $2^n$ subset-sums of set A takes on the order of $2^n$ steps regardless of whether Mabel or Mildred applies it. So for large n, regardless of who the computer is, the Meet-in-the-Middle algorithm is faster than the naive exhaustive comparison algorithm - from this example, we can understand the general principle that the asymptotic running-time of an algorithm does not differ by more than a polynomial factor when run on different types of computers. Therefore, since no algorithm is faster than the Meet-in-the-Middle algorithm for solving SUBSET-SUM for large n when applied by Mabel, we can conclude that no algorithm is faster than the Meet-in-the-Middle algorithm for solving SUBSET-SUM for large n when applied by Mildred. And furthermore, using this same reasoning, we can conclude that no algorithm is faster than the Meet-in-the-Middle algorithm for solving SUBSET-SUM for large n when run on a modern computing machine. □

So it doesn't matter whether the computer is Mabel, Mildred, or any modern computing machine; the fastest algorithm which solves the SUBSET-SUM problem for large n is the Meet-in-the-Middle algorithm. Because once a solution to the SUBSET-SUM problem is found, it is easy to verify (in polynomial-time) that it is indeed a solution, we say that the SUBSET-SUM problem is in class NP. And because there is no algorithm which solves SUBSET-SUM that runs in polynomial-time (since the Meet-in-the-Middle algorithm runs in exponential-time and is the fastest algorithm for solving SUBSET-SUM, as we have shown above), we say that the SUBSET-SUM problem is not in class P. Then since the SUBSET-SUM problem is in class NP but not in class P, we can conclude that $P \neq NP$, thus solving the P versus NP problem. The solution to the P versus NP problem demonstrates that it is possible to hide abstract objects (in this case, a subset of set A) without an abundance of resources - it is, in general, more difficult to find a subset of a set of only one hundred integers for which the sum of its elements equals a target integer than to find the winning lottery-ticket in a pile of one billion unsorted lottery tickets, even though the lottery-ticket problem requires many more resources (one billion lottery tickets) than the SUBSET-SUM problem requires (a list of one hundred integers).

3 Does $P \neq NP$ really matter?

Even though $P \neq NP$, might there still be algorithms which efficiently solve problems that are in NP but not P in the average-case scenario? (Since the $P \neq NP$ result deals only with the worst-case scenario, there is nothing to forbid this from happening.) The answer is yes; for many problems which are in NP but not P, there exist algorithms which efficiently solve them in the average-case scenario, so the statement that $P \neq NP$ is not as ominous as it sounds. In fact, there is a very clever algorithm which solves almost all instances of the SUBSET-SUM problem in polynomial-time. (The algorithm works by converting the SUBSET-SUM problem into the problem of finding the shortest non-zero vector of a lattice, given its basis.) But even though for many problems which are in NP but not P, there exist algorithms which efficiently solve them in the average-case scenario, in the opinion of most complexity-theorists, it is probably false that for all problems which are in NP but not P, there exist algorithms which efficiently solve them in the average-case scenario.



Even though $P \neq NP$, might it still be possible that there exist polynomial-time randomized algorithms which correctly solve problems in NP but not in P with a high probability regardless of the problem instance? (The word "randomized" in this context means that the algorithm bases some of its decisions upon random variables. The advantage of these types of algorithms is that whenever they fail to output a solution, there is still a good chance that they will succeed if they are run again.) The answer is probably no, as there is a widely believed conjecture that $P = BPP$, where BPP is the class of decision problems for which there are polynomial-time randomized algorithms that correctly solve them at least two-thirds of the time regardless of the problem instance [22].
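To see why a success rate of "at least two-thirds" is a meaningful threshold, note that running such an algorithm several times and taking the majority answer drives the error down quickly. The sketch below computes that probability exactly from the binomial distribution; it assumes independent runs and an odd number of runs, and the function name is hypothetical.

```python
from math import comb


def majority_success_probability(runs, p=2 / 3):
    """Probability that a strict majority of `runs` independent runs is
    correct, when each run is correct with probability p (runs odd)."""
    need = runs // 2 + 1
    return sum(comb(runs, k) * p**k * (1 - p) ** (runs - k)
               for k in range(need, runs + 1))


if __name__ == "__main__":
    for runs in (1, 3, 11, 101):
        print(runs, round(majority_success_probability(runs), 6))
```

This illustrates only the definition of BPP; it has no bearing on whether the conjecture $P = BPP$ is true.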

4 Are Quantum Computers the Answer?

A quantum computer is any computing device which makes direct use of distinctively quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data. As of today, the field of practical quantum computing is still in its infancy; however, much is known about the theoretical properties of a quantum computer. For instance, quantum computers have been shown to efficiently solve certain types of problems, like factoring integers, which are believed to be difficult to solve on a classical computer, e.g., a human-computer like Mabel or Mildred or a machine-computer like an IBM PC or an Apple Macintosh.

Is it possible that one day quantum computers will be built and will solve problems like the SUBSET-SUM problem efficiently in polynomial-time? The answer is that it is generally suspected by complexity theorists to be impossible for a quantum computer to solve the SUBSET-SUM problem (and all other problems which share a characteristic with the SUBSET-SUM problem in that they belong to a subclass of NP problems known as NP-complete problems [5]) in polynomial-time. A curious fact is that if one were to solve the SUBSET-SUM problem on a quantum computer by brute force, the algorithm would have a running-time on the order of $2^{n/2}$ steps, which (by coincidence?) is the same asymptotic running-time of the fastest algorithm which solves SUBSET-SUM on a classical computer, the Meet-in-the-Middle algorithm.

In any case, no one has ever built a practical quantum computer and some scientists are even of the opinion that building such a computer is impossible; the acclaimed complexity theorist, Leonid Levin, wrote: "QC of the sort that factors long numbers seems firmly rooted in science fiction. It is a pity that popular accounts do not distinguish it from much more believable ideas, like Quantum Cryptography, Quantum Communications, and the sort of Quantum Computing that deals primarily with locality restrictions, such as fast search of long arrays. It is worth noting that the reasons why QC must fail are by no means clear; they merit thorough investigation. The answer may bring much greater benefits to the understanding of basic physical concepts than any factoring device could ever promise. The present attitude is analogous to, say, Maxwell selling the Daemon of his famous thought experiment as a path to cheaper electricity from heat. If he did, much of insights of today's thermodynamics might be lost or delayed" [25].

5 Unprovable Conjectures 

In the early twentieth century, the famous mathematician, David Hilbert, proposed the idea that all mathematical facts can be derived from only a handful of self-evident axioms. In the 1930's, Kurt Gödel proved that such a scenario is impossible by showing that for any proposed finite axiom system for arithmetic, there must always exist true statements that are unprovable within the system, if one is to assume that the axiom system has no inconsistencies. Alan Turing extended this result to show that it is impossible to design a computer program which can determine whether any other computer program will eventually halt. In the latter half of the 20th century, Gregory Chaitin defined a real number between zero and one, which he calls $\Omega$, to be the probability that a computer program halts. And Chaitin proved that:

Theorem 1 - For any mathematics problem, the bits of $\Omega$, when $\Omega$ is expressed in binary, completely determine whether that problem is solvable or not.

Theorem 2 - The bits of $\Omega$ are random and only a finite number of them are even possible to know.

From these two theorems, it follows that the very structure of mathematics itself is random and mostly unknowable!

Even though Hilbert's dream to be able to derive every mathematical fact from only a handful of self-evident axioms was destroyed by Gödel in the 1930's, this idea has still had an enormous impact on current mathematics research [43]. In fact, even though mathematicians as of today accept the incompleteness theorems proven by Gödel, Turing, and Chaitin as true, in general these same mathematicians also believe that these incompleteness theorems ultimately have no impact on traditional mathematics research, and they have thus adopted Hilbert's paradigm of deriving mathematical facts from only a handful of self-evident axioms as a practical way of researching mathematics. Gregory Chaitin has been warning these mathematicians for decades now that these incompleteness theorems are actually very relevant to advanced mathematics research, but the overwhelming majority of mathematicians have not taken his warnings seriously. We shall directly confirm Chaitin's assertion that incompleteness is indeed very relevant to advanced mathematics research by giving very strong evidence that two famous mathematics problems, determining whether the Collatz 3n + 1 Conjecture is true and determining whether the Riemann Hypothesis is true, are impossible to solve:

The Collatz 3n + 1 Conjecture - Here's a fun experiment that you, the reader, can try: Pick any positive integer, n. If n is even, then compute $n/2$ or if n is odd, then compute $(3n + 1)/2$. Then let n equal the result of this computation and perform the whole procedure again until n = 1. For instance, if you had chosen n = 11, you would have obtained the sequence $(3 \times 11 + 1)/2 = 17$, $(3 \times 17 + 1)/2 = 26$, $26/2 = 13$, 20, 10, 5, 8, 4, 2, 1.
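The experiment is easy to run on a computer. A minimal sketch follows; the function names are chosen here for illustration, and the loop is only known to terminate for inputs where the conjecture has been checked.

```python
def collatz_step(n):
    """One iteration: n/2 if n is even, (3n + 1)/2 if n is odd."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def collatz_sequence(n):
    """The values of n visited, starting from the input and ending at 1."""
    sequence = [n]
    while n != 1:
        n = collatz_step(n)
        sequence.append(n)
    return sequence


if __name__ == "__main__":
    print(collatz_sequence(11))
    # [11, 17, 26, 13, 20, 10, 5, 8, 4, 2, 1]
```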

The Collatz 3n + 1 Conjecture states that such an algorithm will always eventually reach n = 1 and halt [23]. Computers have verified this conjecture to be true for all positive integers less than $224 \times 2^{50} \approx 2.52 \times 10^{17}$. Why does this happen? One can give an informal argument as to why this may happen as follows: Let us assume that at each step, the probability that n is even is one-half and the probability that n is odd is one-half. Then at each iteration, n will decrease by a multiplicative factor of about $(\frac{1}{2})^{1/2} (\frac{3}{2})^{1/2} = (\frac{3}{4})^{1/2}$ on average, which is less than one; therefore, n will eventually converge to one with probability one. But such an argument is not a rigorous mathematical proof, since the probability assumptions that the argument is based upon are not well-defined and even if they were well-defined, it would still be possible (although extremely unlikely, with probability zero) that the algorithm will never halt for some input.
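Written out, the heuristic averages the logarithm of the per-step factor under the stated assumption that even and odd steps are equally likely. Here $n_k$ denotes the value of n after k iterations, a notation introduced only for this sketch, and the factor $3/2$ for odd steps ignores the "+1", which is negligible for large n.

```latex
\[
\mathbb{E}\!\left[\log \frac{n_{k+1}}{n_k}\right]
\approx \tfrac{1}{2}\log\tfrac{1}{2} + \tfrac{1}{2}\log\tfrac{3}{2}
= \tfrac{1}{2}\log\tfrac{3}{4} < 0,
\qquad
\left(\tfrac{1}{2}\right)^{1/2}\left(\tfrac{3}{2}\right)^{1/2}
= \left(\tfrac{3}{4}\right)^{1/2} \approx 0.866 .
\]
```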

Is there a rigorous mathematical proof of the Collatz 3n + 1 Conjecture? As of today, no one has found a rigorous proof that the conjecture is true and no one has found a rigorous proof that the conjecture is false. In fact, Paul Erdős, who was one of the greatest mathematicians of the twentieth century, commented about the Collatz 3n + 1 Conjecture: "Mathematics is not yet ready for such problems" [23]. We can informally demonstrate that there is no way to deductively prove that the conjecture is true, as follows:

Explanation: First, notice that in order to be certain that the algorithm will halt for a given input n, it is necessary to know whether the value of n at the beginning of each iteration of the algorithm is even or odd. (For a rigorous proof of this, see "The Collatz Conjecture is Unprovable".) For instance, if the algorithm starts with input n = 11, then in order to know that the algorithm halts at one, it is necessary to know that 11 is odd, $(3 \times 11 + 1)/2 = 17$ is odd, $(3 \times 17 + 1)/2 = 26$ is even, $26/2 = 13$ is odd, 20 is even, 10 is even, 5 is odd, 8 is even, 4 is even, and 2 is even. We can express this information (odd, odd, even, odd, even, even, odd, even, even, even) as a vector of zeroes and ones, (1, 1, 0, 1, 0, 0, 1, 0, 0, 0). Let us call this vector the parity vector of n. (If n never converges to one, then its parity vector must be infinite-dimensional.) If one does not know the parity vector of the input, then it is impossible to know what the algorithm does at each iteration and therefore impossible to be certain that the algorithm will converge to one. So any proof that the algorithm applied to n halts must specify the parity vector of n. Next, let us give a definition of a random vector:

Definition - We shall say that a vector $x \in \{0, 1\}^m$ is random if x cannot be specified in less than m bits in a computer text-file.

Example 1 - The vector of one million concatenations of the vector (0, 1) is not random, since we can specify it in less than two million bits in a computer text-file. (We just did.)

Example 2 - The vector of outcomes of one million coin-tosses has a good chance of fitting our definition of "random", since much of the time the most compact way of specifying such a vector is to simply make a list of the results of each coin-toss, in which one million bits are necessary.
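A general-purpose compressor gives a rough feel for the two examples, with important caveats: a compressed size is only an upper bound on how briefly a vector can be specified, and the "coin tosses" below come from a seeded pseudo-random generator, so strictly speaking they are not random in the sense of the definition (the seed and the program specify them). The sketch assumes Python's standard `zlib` and `random` modules; the variable names are illustrative.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


# Example 1: one million copies of (0, 1), stored one entry per byte.
periodic = bytes([0, 1]) * 1_000_000

# Stand-in for Example 2: one million bits (125,000 bytes) from a seeded PRNG.
rng = random.Random(0)
coin_tosses = bytes(rng.getrandbits(8) for _ in range(125_000))

if __name__ == "__main__":
    print(len(periodic), "->", compressed_size(periodic))
    print(len(coin_tosses), "->", compressed_size(coin_tosses))
```

The periodic vector shrinks to a tiny fraction of its length, while the compressor finds essentially nothing to exploit in the coin-toss stand-in.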

Now let us suppose that it were possible to prove the Collatz 3n + 1 Conjecture and let B be the number of bits in a hypothetical computer text-file containing such a proof. And let $(x_0, x_1, x_2, \ldots, x_B)$ be a random vector, as defined above. (It is not difficult to prove that at least half of all vectors with B + 1 zeroes and ones are random.) There is a mathematical theorem [23] which says that there must exist a number n with the first B + 1 bits of its parity vector equal to $(x_0, x_1, x_2, \ldots, x_B)$; therefore, any proof of the Collatz 3n + 1 Conjecture must specify vector $(x_0, x_1, x_2, \ldots, x_B)$ (as we discussed above), since such a proof must show that the Collatz algorithm halts when given input n. But since vector $(x_0, x_1, x_2, \ldots, x_B)$ is random, B + 1 bits are required to specify vector $(x_0, x_1, x_2, \ldots, x_B)$, contradicting our assumption that B is the number of bits in a computer text-file containing a proof of the Collatz 3n + 1 Conjecture; therefore, a formal proof of the Collatz 3n + 1 Conjecture cannot exist. □
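The parity vector used in this explanation, and the cited theorem that every finite 0/1 pattern occurs as the start of some parity vector, can both be explored numerically. The sketch below is illustrative only: the function names are assumptions, the prefix function keeps iterating past 1 so that every input yields a prefix of the requested length, and the exhaustive check covers only small lengths, so it is evidence for the theorem rather than a proof.

```python
def collatz_step(n):
    """One iteration: n/2 if n is even, (3n + 1)/2 if n is odd."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def parity_vector(n):
    """Parities (1 = odd, 0 = even) of n at each iteration until n = 1."""
    bits = []
    while n != 1:
        bits.append(n % 2)
        n = collatz_step(n)
    return bits


def parity_prefix(n, length):
    """First `length` parities, continuing to iterate past 1 if needed."""
    bits = []
    for _ in range(length):
        bits.append(n % 2)
        n = collatz_step(n)
    return tuple(bits)


def prefixes_cover_all_vectors(length):
    """Check that the inputs 1 .. 2**length realise every 0/1 vector
    of the given length as a parity prefix."""
    seen = {parity_prefix(n, length) for n in range(1, 2**length + 1)}
    return len(seen) == 2**length


if __name__ == "__main__":
    print(parity_vector(11))
    print(prefixes_cover_all_vectors(10))
```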

The Riemann Hypothesis - There is also another famous unresolved conjecture, the Riemann Hypothesis, which has a characteristic similar to that of the Collatz 3n + 1 Conjecture, in that it too can never be proven true. In the opinion of many mathematicians, the Riemann Hypothesis is the most important unsolved problem in mathematics [13]. The reason why it is so important is that a resolution of the Riemann Hypothesis would shed much light on the distribution of prime numbers: It is well known that the number of prime numbers less than n is approximately $\int_2^n \frac{dx}{\log x}$. If the Riemann Hypothesis is true, then for large n, the error in this approximation must be bounded by $c n^{1/2} \log n$ for some constant $c > 0$, which is also a bound for a random walk, i.e., the sum of n independent random variables, $X_k$, for $k = 1, 2, \ldots, n$ in which the probability that $X_k = -c$ is one-half and the probability that $X_k = c$ is one-half.
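The quality of this approximation can be inspected for moderate n. The following sketch counts primes with a sieve and estimates the integral with a simple midpoint rule; both function names and the choice of step size are assumptions for illustration, and a check at one value of n says nothing about the Riemann Hypothesis itself.

```python
from math import log, sqrt


def prime_count_below(n):
    """Number of primes less than n, by the sieve of Eratosthenes (n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(sqrt(n)) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n, p)))
    return sum(sieve)


def log_integral_from_2(n, steps_per_unit=10):
    """Midpoint-rule estimate of the integral of dx / log x from 2 to n."""
    h = 1.0 / steps_per_unit
    total = 0.0
    for i in range((n - 2) * steps_per_unit):
        total += h / log(2 + (i + 0.5) * h)
    return total


if __name__ == "__main__":
    n = 10**5
    primes, estimate = prime_count_below(n), log_integral_from_2(n)
    print(primes, round(estimate, 1), round(sqrt(n) * log(n), 1))
```

For n = 100,000 the difference between the two quantities is a few dozen, far inside $n^{1/2} \log n$.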

The Riemann-Zeta function $\zeta(s)$ is a complex function which is defined to be $\zeta(s) = \frac{s}{s-1} - s \int_1^\infty \frac{x - \lfloor x \rfloor}{x^{s+1}} \, dx$ when the real part of the complex number s is positive. The Riemann Hypothesis states that if $\rho = \sigma + ti$ is a complex root of $\zeta$ and $0 < \sigma < 1$, then $\sigma = 1/2$. It is well known that there are infinitely many roots of $\zeta$ that have $0 < \sigma < 1$. And just like the Collatz 3n + 1 Conjecture, the Riemann Hypothesis has been verified by high-speed computers - for all $|t| < T$ where $T \approx 2.0 \times 10^{20}$. But it is still unknown whether there exists a $|t| \geq T$ such that $\zeta(\sigma + ti) = 0$, where $\sigma \neq 1/2$. And just like the Collatz 3n + 1 Conjecture, one can give a heuristic probabilistic argument that the Riemann Hypothesis is true, as follows:

It is well known that the Riemann Hypothesis follows from the assertion that for large $n$,
$M(n) = \sum_{k=1}^{n} \mu(k)$ is bounded by $c\,n^{1/2} \log n$ for some constant $c > 0$,
where $\mu$ is the Möbius function defined on $\mathbb{N}$ in which $\mu(k) = -1$ if $k$ is
the product of an odd number of distinct primes, $\mu(k) = 1$ if $k$ is the product of an even
number of distinct primes, and $\mu(k) = 0$ otherwise. If we were to assume that $M(n)$ is
distributed as a random walk, which is certainly plausible since there is no apparent reason
why it should not be distributed as a random walk, then by probability theory, $M(n)$ is
bounded for large $n$ by $c\,n^{1/2} \log n$ for some constant $c > 0$, with probability one;
therefore, it is very likely that the Riemann Hypothesis is true. We shall now explain why the
Riemann Hypothesis is unprovable, just like the Collatz 3n + 1 Conjecture.
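
Before that explanation, the random-walk heuristic for $M(n)$ can be made concrete with a
small computation. The following is a minimal Python sketch that tabulates $\mu(k)$ with a
sieve and accumulates the partial sums $M(n)$. The function names `mobius_sieve` and `mertens`
are hypothetical names chosen for this illustration, and comparing $|M(n)|$ with $n^{1/2}$
over a small range is an assumption of the example, not a claim made in the article.

```python
import math

def mobius_sieve(n):
    """Return a list mu[0..n] with mu[k] equal to the Mobius function of k."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more distinct prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0               # divisible by a square, so mu is 0
    return mu

def mertens(n):
    """Return the list M[0..n] of partial sums M(j) = mu(1) + ... + mu(j)."""
    mu = mobius_sieve(n)
    M = [0] * (n + 1)
    for k in range(1, n + 1):
        M[k] = M[k - 1] + mu[k]
    return M

if __name__ == "__main__":
    M = mertens(10_000)
    # Largest ratio |M(n)| / n^(1/2) seen in this finite range.
    print(max(abs(M[k]) / math.sqrt(k) for k in range(1, 10_001)))
```

Such a table only shows how $M(n)$ behaves over the range computed; it is numerical evidence
in the same spirit as the heuristic argument, not a proof of any bound.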

Explanation: The Riemann Hypothesis is equivalent to the assertion that for each $T > 0$, the
number of real roots $t$ of $\zeta(1/2 + ti)$, where $0 < t < T$, is equal to the number of
roots of $\zeta(s)$ in $\{s = \sigma + ti : 0 < \sigma < 1,\ 0 < t < T\}$. It is well known
that there exists a continuous real function $Z(t)$ (called the Riemann-Siegel function) such
that $|Z(t)| = |\zeta(1/2 + ti)|$, so the real roots $t$ of $\zeta(1/2 + ti)$ are the same as
the real roots $t$ of $Z(t)$. (The formula for $Z(t)$ is $\zeta(1/2 + ti)\,e^{i\vartheta(t)}$,
where $\vartheta(t) = \arg[\Gamma(\frac{1}{4} + \frac{1}{2}it)] - \frac{1}{2}t \ln \pi$.)
Then because the formula for the real roots $t$ of $\zeta(1/2 + ti)$ cannot be reduced to a
formula that is simpler than the equation $\zeta(1/2 + ti) = 0$, the only way to determine the
number of real roots $t$ of $\zeta(1/2 + ti)$ in which $0 < t < T$ is to count the changes in
sign of the real function $Z(t)$, where $0 < t < T$ [31].

So in order to prove that the number of real roots $t$ of $\zeta(1/2 + ti)$, where
$0 < t < T$, is equal to the number of roots of $\zeta(s)$ in
$\{s = \sigma + ti : 0 < \sigma < 1,\ 0 < t < T\}$, which can be computed via a theorem known
as the Argument Principle without counting the changes in sign of $Z(t)$, where $0 < t < T$
[27, 31, 33], it is necessary to count the changes in sign of $Z(t)$, where $0 < t < T$.
(Otherwise, it would be possible to determine the number of real roots $t$ of
$\zeta(1/2 + ti)$, where $0 < t < T$, without counting the changes in sign of $Z(t)$ by
computing the number of roots of $\zeta(s)$ in
$\{s = \sigma + ti : 0 < \sigma < 1,\ 0 < t < T\}$ via the Argument Principle.) As $T$ becomes
arbitrarily large, the time that it takes to count the changes in sign of $Z(t)$, where
$0 < t < T$, approaches infinity for the following reasons: (1) There are infinitely many
changes in sign of $Z(t)$. (2) The time that it takes to evaluate the sign of $Z(t)$
approaches infinity as $t \to \infty$ [31]. Hence, an infinite amount of time is required to
prove that for each $T > 0$, the number of real roots $t$ of $\zeta(1/2 + ti)$, where
$0 < t < T$, is equal to the number of roots of $\zeta(s)$ in
$\{s = \sigma + ti : 0 < \sigma < 1,\ 0 < t < T\}$ (which is equivalent to proving the Riemann
Hypothesis), so the Riemann Hypothesis is unprovable. □

Chaitin's incompleteness theorem implies that mathematics is filled with facts which are both
true and unprovable, since it states that the bits of $\Omega$ completely determine whether
any given mathematics problem is solvable and only a finite number of bits of $\Omega$ are
even knowable [8]. And we have shown that there is a very good chance that both the Collatz
3n + 1 Conjecture and the Riemann Hypothesis are examples of such facts. Of course, we can
never formally prove that either one of these conjectures is both true and unprovable, for
obvious reasons. The best we can do is prove that they are unprovable and provide
computational evidence and heuristic probabilistic reasoning to explain why these two
conjectures are most likely true, as we have done. And of course, it is conceivable that one
could find a counter-example to the Collatz 3n + 1 Conjecture by finding a number $n$ for
which the Collatz algorithm gets stuck in a nontrivial cycle or a counter-example to the
Riemann Hypothesis by finding a complex root, $\rho = \sigma + ti$, of $\zeta$ for which
$0 < \sigma < 1$ and $\sigma \neq 1/2$. But so far, no one has presented any such
counter-examples.

The theorems that the Collatz 3n + 1 Conjecture and the Riemann Hypothesis are unprovable
illustrate a point which Chaitin has been making for years, that mathematics is not so much
different from empirical sciences like physics. For instance, scientists universally accept
the law of gravity to be true based on experimental evidence, but such a law is by no means
absolutely certain - even though the law of gravity has been observed to hold in the past, it
is not inconceivable that the law of gravity may cease to hold in the future. So too, in
mathematics there are conjectures like the Collatz 3n + 1 Conjecture and the Riemann
Hypothesis which are strongly supported by experimental evidence but can never be proven true
with absolute certainty.

6 Computational Irreducibility 

Up until the last decade of the twentieth century, the 
most famous unsolved problem in all of mathematics was 
to prove the following conjecture: 

Fermat's Last Theorem (FLT) - When $n > 2$, the equation $x^n + y^n = z^n$ has no nontrivial
integer solutions.

After reading the explanations in the previous section, a skeptic asked the author what the
difference is between the previous argument that the Collatz 3n + 1 Conjecture is unprovable
and the following argument that Fermat's Last Theorem is unprovable (which cannot possibly be
valid, since Fermat's Last Theorem was proven by Wiles and Taylor in the last decade of the
twentieth century):

Invalid Proof that FLT is unprovable: Suppose that we have a computer program which computes
$x^n + y^n - z^n$ for each $x, y, z \in \mathbb{Z}$ and $n > 2$ until it finds a nontrivial
$(x, y, z, n)$ such that $x^n + y^n - z^n = 0$ and then halts. Obviously, Fermat's Last
Theorem is equivalent to the assertion that such a computer program can never halt. In order
to be certain that such a computer program will never halt, it is necessary to compute
$x^n + y^n - z^n$ for each $x, y, z \in \mathbb{Z}$ and $n > 2$ to determine that
$x^n + y^n - z^n \neq 0$ for each nontrivial $(x, y, z, n)$. Since this would take an infinite
amount of time, Fermat's Last Theorem is unprovable. □
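
The search program described in this argument can only ever be run in a bounded form. The
following minimal Python sketch restricts the search to positive integers $x \le y$ up to a
chosen bound; the function name `search_power_sum` and the `bound` parameter are hypothetical
and are introduced only for this illustration.

```python
def search_power_sum(n, bound):
    """Return the first (x, y, z) with 1 <= x <= y <= bound and x**n + y**n == z**n, or None."""
    # Since x**n + y**n <= 2 * bound**n, any matching z is below 2 * bound.
    powers = {z ** n: z for z in range(1, 2 * bound + 1)}
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            z = powers.get(x ** n + y ** n)
            if z is not None:
                return (x, y, z)
    return None

if __name__ == "__main__":
    print(search_power_sum(2, 20))                              # exponent 2 is not covered by FLT
    print([search_power_sum(n, 60) for n in range(3, 7)])       # exponents 3 to 6
```

A run of this kind finishes without finding a solution for $n > 2$, but it covers only
finitely many cases, so by itself it cannot establish the theorem.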

This proof is invalid, because the assertion that "it is necessary to compute
$x^n + y^n - z^n$ for each $x, y, z \in \mathbb{Z}$ and $n > 2$ to determine that
$x^n + y^n - z^n \neq 0$ for each nontrivial $(x, y, z, n)$" is false. In order to determine
that an equation is false, it is not necessary to compute both sides of the equation - for
instance, it is possible to know that the equation $6x + 9y = 74$ has no integer solutions
without evaluating $6x + 9y$ for every $x, y \in \mathbb{Z}$, since one can see that if there
were any integer solutions, the left-hand-side of the equation would be divisible by three but
the right-hand-side would not be divisible by three.
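
The divisibility argument above is an instance of a general fact (Bézout's identity): the
equation $ax + by = c$ has integer solutions exactly when $\gcd(a, b)$ divides $c$. A minimal
Python sketch of that test follows; the function name `has_integer_solution` is a hypothetical
name used only for this illustration.

```python
from math import gcd

def has_integer_solution(a, b, c):
    """Return True when a*x + b*y == c has a solution in integers x, y (a and b not both zero)."""
    return c % gcd(a, b) == 0

if __name__ == "__main__":
    print(has_integer_solution(6, 9, 74))  # gcd(6, 9) = 3 does not divide 74
    print(has_integer_solution(6, 9, 75))  # 3 divides 75, e.g. x = 11, y = 1
```

The check takes a single gcd computation, in contrast to evaluating $6x + 9y$ for every pair
of integers.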

Question - So why can't we apply this same reasoning to show that the proof that the Collatz
3n + 1 Conjecture is unprovable is invalid? Just as it is not necessary to compute
$x^n + y^n - z^n$ in order to determine that $x^n + y^n - z^n \neq 0$, is it not possible that
one can determine that the Collatz algorithm will converge to one without knowing what the
algorithm does at each iteration?

Answer - Because what the Collatz algorithm does at each iteration is what determines whether
or not the Collatz sequence converges to one, it is necessary to know what the Collatz
algorithm does at each iteration in order to determine that the Collatz sequence converges to
one. Because the exact values of $x^n + y^n - z^n$ are not relevant to knowing that
$x^n + y^n - z^n \neq 0$ for each nontrivial $(x, y, z, n)$, it is not necessary to compute
each $x^n + y^n - z^n$ in order to determine that $x^n + y^n - z^n \neq 0$ for each nontrivial
$(x, y, z, n)$.

Exercise - You are given a deck of $n$ cards labeled $1, 2, 3, \ldots, n$. You shuffle the
deck. Then you perform the following "reverse-card-shuffling" procedure: Look at the top card
labeled $k$. If $k = 1$, then stop. Otherwise, reverse the order of the first $k$ cards in the
deck. Then look at the top card again and repeat the same procedure. For example, if $n = 7$
and the deck were in order 5732416 (where 5 is the top card), then you would obtain
4237516 → 7324516 → 6154237 → 3245167 → 4235167 → 5324167 → 1423567. Now, we present two
problems:

• Prove that such a procedure will always halt for any $n$ and any shuffling of the $n$ cards.

• Find a closed formula for the maximum number of iterations that it may take for such a
procedure to halt given the number of cards in the deck, or prove that no such formula exists.
(The maximum number of iterations for $n = 1, 2, 3, \ldots, 16$ are
0, 1, 2, 4, 7, 10, 16, 22, 30, 38, 51, 65, 80, 101, 113, 139.)

It is easy to use the principle of mathematical induction to solve the first problem. As for
the second problem, it turns out that there is no closed formula; in other words, in order to
find the maximum number of iterations that it may take for such a procedure to halt given the
number of cards $n$ in the deck, it is necessary to perform the reverse-card-shuffling
procedure on every possible permutation of $1, 2, 3, \ldots, n$. This property of the
Reverse-Card-Shuffling Problem in which there is no way to determine the outcome of the
reverse-card-shuffling procedure without actually performing the procedure itself is called
computational irreducibility [42]. Notice that the notion of computational irreducibility also
applies to the Collatz 3n + 1 Conjecture and the Riemann Hypothesis in that an infinite number
of irreducible computations are necessary to prove these two conjectures.
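
The brute-force computation described above, running the procedure on every permutation, can
be sketched in a few lines of Python. The function names `reverse_card_steps` and `max_steps`
are hypothetical names chosen for this illustration; the sketch is practical only for small
$n$, since the number of permutations grows as $n!$.

```python
from itertools import permutations

def reverse_card_steps(deck):
    """Count the reversals performed until the card labeled 1 is on top."""
    deck = list(deck)
    steps = 0
    while deck[0] != 1:
        k = deck[0]
        deck[:k] = deck[:k][::-1]   # reverse the order of the first k cards
        steps += 1
    return steps

def max_steps(n):
    """Return the maximum number of reversals over all orderings of cards 1..n."""
    return max(reverse_card_steps(p) for p in permutations(range(1, n + 1)))

if __name__ == "__main__":
    print(reverse_card_steps([5, 7, 3, 2, 4, 1, 6]))   # the worked example from the exercise
    print([max_steps(n) for n in range(1, 8)])          # maxima for n = 1, ..., 7
```

For the example deck 5732416 the function counts the seven reversals listed in the exercise,
and the maxima for small $n$ can be compared with the values quoted there.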

Stephen Wolfram, who coined the phrase "computational irreducibility", argues in his famous
book, A New Kind of Science [42], that our universe is computationally irreducible, i.e., the
universe is so complex that there is no general method for determining the outcome of a
natural event without either observing the event itself or simulating the event on a computer.
The dream of science is to be able to make accurate predictions about our natural world; in a
computationally irreducible universe, such a dream is only possible for very simple phenomena
or for events which can be accurately simulated on a computer.

7 Open Problems in Mathematics

In the present year of 2006, the most famous unsolved 
number theory problem is to prove the following: 

Goldbach's Conjecture - Every even number greater 
than two is the sum of two prime numbers. 
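
The statement above can be checked directly for small even numbers. The following minimal
Python sketch does so with a sieve; the function names `prime_table`, `goldbach_pair`, and
`first_counterexample` and the search limit are assumptions made for this illustration, and
the sketch is far smaller in scale than the published computer verifications.

```python
import math

def prime_table(limit):
    """Return a list t with t[k] True exactly when k is prime, for 0 <= k <= limit (limit >= 2)."""
    t = [True] * (limit + 1)
    t[0] = t[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if t[p]:
            for m in range(p * p, limit + 1, p):
                t[m] = False
    return t

def goldbach_pair(m, table):
    """Return the pair of primes (p, m - p) with the smallest p, or None if there is none."""
    for p in range(2, m // 2 + 1):
        if table[p] and table[m - p]:
            return (p, m - p)
    return None

def first_counterexample(limit):
    """Return the smallest even m in [4, limit] with no such pair, or None if every m has one."""
    table = prime_table(limit)
    for m in range(4, limit + 1, 2):
        if goldbach_pair(m, table) is None:
            return m
    return None

if __name__ == "__main__":
    print(goldbach_pair(100, prime_table(100)))
    print(first_counterexample(10_000))
```

A result of `None` from `first_counterexample` means only that no counter-example exists up
to the chosen limit.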

Just like the Collatz 3n + 1 Conjecture and the Riemann Hypothesis, there are heuristic
probabilistic arguments which support Goldbach's Conjecture, and Goldbach's Conjecture has
been verified by computers for a large number of even numbers [20]. The closest anyone has
come to proving Goldbach's Conjecture is a proof of the following:

Chen's Theorem - Every sufficiently large even integer is either the sum of two prime numbers
or the sum of a prime number and the product of two prime numbers.

Although the author cannot prove it, he believes the 
following: 

Conjecture 1 - Goldbach's Conjecture is unprovable. 
Another famous conjecture which is usually mentioned along with Goldbach's Conjecture in
mathematics literature is the following:

The Twin Primes Conjecture - There are infinitely many prime numbers $p$ for which $p + 2$ is
also prime.

Just as with Goldbach's Conjecture, the author cannot 
prove it, but he believes the following: 

Conjecture 2 - The Twin Primes Conjecture is undecidable, i.e., it is impossible to know
whether the Twin Primes Conjecture is true or false.
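
Although no finite computation can settle the Twin Primes Conjecture, listing twin primes
below a bound shows what the conjecture is about. The following is a minimal Python sketch;
the function name `twin_prime_pairs` and the bounds used are assumptions made for this
illustration.

```python
import math

def twin_prime_pairs(limit):
    """Return every pair (p, p + 2) of primes with p + 2 < limit."""
    if limit < 3:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit - 1) + 1):
        if is_prime[p]:
            for m in range(p * p, limit, p):
                is_prime[m] = False
    return [(p, p + 2) for p in range(2, limit - 2) if is_prime[p] and is_prime[p + 2]]

if __name__ == "__main__":
    print(twin_prime_pairs(100))
    print(len(twin_prime_pairs(1000)))
```

The list keeps growing as the bound is raised, but a finite list of any length leaves the
question of infinitely many pairs open.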

8 Conclusion 

The $P \neq NP$ problem, the Collatz 3n + 1 Conjecture,
and the Riemann Hypothesis demonstrate to us that as 
finite human beings, we are all severely limited in our 
ability to solve abstract problems and to understand our 
universe. The author hopes that this observation will 
help us all to better appreciate the fact that there are 
still so many things which G-d allows us to understand. 